BiRe-ID: Binary Neural Network for Efficient Person Re-ID


$$\frac{\partial L^F_{\mathrm{MSE}}}{\partial w_i} = \mu\,(a_i - a_i^H)\,\frac{\partial a_i}{\partial w_i}\, I(i \in \mathbb{L}), \tag{6.23}$$

where $I(\cdot)$ is an indicator function over $\mathbb{L}$, the set of layer indices supervised with FR-GAL, defined as

$$I(i \in \mathbb{L}) = \begin{cases} 1, & \text{the $i$-th layer is supervised with FR-GAL,} \\ 0, & \text{else.} \end{cases} \tag{6.24}$$

As mentioned above, we employ several FR-GALs in the training process; $I(i \in \mathbb{L})$ therefore indicates whether the $i$-th layer is supervised with an FR-GAL. Note that FR-GAL is only used to supervise the low-level features, so no gradient is propagated to the high-level feature $a_i^H$.

In this way, we calculate the overall gradient $\delta_{w_i}$ for each $w_i$ and update it as

$$w_i \leftarrow w_i - \eta_1 \delta_{w_i}, \tag{6.25}$$

where $\eta_1$ is the learning rate.
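To make these update rules concrete, the following minimal sketch implements Eqs. (6.23)–(6.25) in plain Python; the names `mu`, `eta1`, `fr_gal_layers`, and `da_dw` are illustrative assumptions rather than the authors' implementation.

```python
# Sketch of Eqs. (6.23)-(6.25): the FR-GAL MSE gradient w.r.t. w_i and the
# SGD-style update of the real-valued kernel w_i. All names here are
# hypothetical, chosen only for exposition.
mu, eta1 = 0.1, 1e-3            # assumed trade-off weight and learning rate
fr_gal_layers = {1, 3}          # the set L of FR-GAL-supervised layer indices

def grad_mse_f(a_i, a_high, da_dw, i):
    """mu * (a_i - a_i^H) * da_i/dw_i * I(i in L)  -- Eqs. (6.23)-(6.24)."""
    indicator = 1.0 if i in fr_gal_layers else 0.0
    return mu * (a_i - a_high) * da_dw * indicator

def update_w(w_i, delta_w_i):
    """w_i <- w_i - eta1 * delta_w_i  -- Eq. (6.25)."""
    return w_i - eta1 * delta_w_i

# Example: a layer in L receives the FR-GAL gradient, one outside does not.
g_in  = grad_mse_f(a_i=0.8, a_high=0.5, da_dw=0.2, i=1)   # nonzero
g_out = grad_mse_f(a_i=0.8, a_high=0.5, da_dw=0.2, i=2)   # 0.0
```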

Update $\alpha_i$: We further update the learnable matrix $\alpha_i$ with $w_i$ fixed. Let $\delta_{\alpha_i}$ be the gradient of $\alpha_i$; we then have

$$\delta_{\alpha_i} = \frac{\partial L}{\partial \alpha_i} = \frac{\partial L_S}{\partial \alpha_i} + \frac{\partial L^K_{\mathrm{Adv}}}{\partial \alpha_i} + \frac{\partial L^K_{\mathrm{MSE}}}{\partial \alpha_i} + \frac{\partial L^F_{\mathrm{Adv}}}{\partial \alpha_i} + \frac{\partial L^F_{\mathrm{MSE}}}{\partial \alpha_i}, \tag{6.26}$$

and

$$\alpha_i \leftarrow \alpha_i - \eta_2 \delta_{\alpha_i}, \tag{6.27}$$

where $\eta_2$ is the learning rate for $\alpha_i$. Furthermore,

$$\frac{\partial L^K_{\mathrm{Adv}}}{\partial \alpha_i} = -\frac{1}{1 - D(\alpha_i \circ b_{w_i}; W_D)} \cdot \frac{\partial D}{\partial (\alpha_i \circ b_{w_i})} \cdot b_{w_i}, \tag{6.28}$$

$$\frac{\partial L^K_{\mathrm{MSE}}}{\partial \alpha_i} = -\lambda\,(w_i - \alpha_i \circ b_{w_i}) \cdot b_{w_i}, \tag{6.29}$$

$$\frac{\partial L^F_{\mathrm{Adv}}}{\partial \alpha_i} = -\frac{1}{1 - D(a_i; W_D)} \cdot \frac{\partial D}{\partial a_i} \cdot \frac{\partial a_i}{\partial \alpha_i} \cdot I(i \in \mathbb{L}), \tag{6.30}$$

$$\frac{\partial L^F_{\mathrm{MSE}}}{\partial \alpha_i} = \mu\,(a_i - a_i^H)\,\frac{\partial a_i}{\partial \alpha_i}\, I(i \in \mathbb{L}). \tag{6.31}$$
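As a concrete illustration of how the five terms in Eq. (6.26) combine, the sketch below assembles $\delta_{\alpha_i}$ from Eqs. (6.28)–(6.31), assuming the discriminator $D$ (with its weights $W_D$ baked into the callable) and its input-derivative `dD` are available; every identifier here is a hypothetical stand-in, not the authors' API.

```python
# Sketch of assembling delta_alpha_i per Eqs. (6.26)-(6.31). grad_s is
# dL_S/dalpha_i; alpha * b_w realizes alpha_i o b_{w_i} elementwise.
# All names are assumptions for exposition.
def delta_alpha(alpha, b_w, w, a, a_high, da_dalpha,
                grad_s, D, dD, lam, mu, in_L):
    x = alpha * b_w                               # alpha_i o b_{w_i}
    g = grad_s                                    # dL_S/dalpha_i
    g = g - dD(x) * b_w / (1.0 - D(x))            # Eq. (6.28), KR-GAL adv.
    g = g - lam * (w - x) * b_w                   # Eq. (6.29), KR-GAL MSE
    if in_L:                                      # indicator I(i in L)
        g = g - dD(a) * da_dalpha / (1.0 - D(a))  # Eq. (6.30), FR-GAL adv.
        g = g + mu * (a - a_high) * da_dalpha     # Eq. (6.31), FR-GAL MSE
    return g                                      # Eq. (6.26): sum of terms

# Update step: alpha_i <- alpha_i - eta2 * delta_alpha_i   (Eq. (6.27))
```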

Update $p_i$: Finally, we update the other parameters $p_i$ with $w_i$ and $\alpha_i$ fixed. $\delta_{p_i}$ is defined as the gradient of $p_i$:

$$\delta_{p_i} = \frac{\partial L_S}{\partial p_i}, \tag{6.32}$$

$$p_i \leftarrow p_i - \eta_3 \delta_{p_i}, \tag{6.33}$$

where $\eta_3$ is the learning rate for the other parameters. These derivations demonstrate that the refining process can be trained end to end. The training process of our BiRe-ID is summarized in Algorithm 13. We update each group of parameters independently while keeping the other parameters of the convolutional layers fixed, which enhances the variation of the feature maps in every layer. In this way, we accelerate the convergence of training and fully exploit the potential of our 1-bit networks.
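The alternating schedule can be read schematically as the following loop; this is a sketch of Algorithm 13 under the assumption that the per-layer gradients of Eqs. (6.23)–(6.32) are supplied as callables, not a verbatim implementation.

```python
# Schematic of the alternating update schedule (cf. Algorithm 13):
# each parameter group is updated while the other two are held fixed.
# The `grads` callables are hypothetical stand-ins for the gradient
# assemblies derived above.
def train_step(w, alpha, p, grads, eta1=1e-3, eta2=1e-3, eta3=1e-3):
    for i in w:                                        # Eq. (6.25)
        w[i] = w[i] - eta1 * grads['w'](i)
    for i in alpha:                                    # Eq. (6.27)
        alpha[i] = alpha[i] - eta2 * grads['alpha'](i)
    for i in p:                                        # Eq. (6.33)
        p[i] = p[i] - eta3 * grads['p'](i)
    return w, alpha, p
```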